# Multilingual speech recognition

Ipa Whisper Base

A multilingual speech recognition model fine-tuned from Whisper-base that outputs International Phonetic Alphabet (IPA) transcriptions.

Speech Recognition

Safetensors Supports Multiple Languages

Canary 1b Flash

NVIDIA NeMo Canary Flash is a family of multilingual multitask models that achieves state-of-the-art performance across multiple speech benchmarks. Supports automatic speech recognition and translation tasks in four languages.

Speech Recognition Supports Multiple Languages

Faster Whisper Large V3 Turbo Int8 Ct2

This is the CTranslate2 conversion of OpenAI's Whisper-large-v3-turbo model with INT8 quantization, intended for efficient speech recognition.

Speech Recognition Supports Multiple Languages
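
For readers unfamiliar with INT8 quantization, the general idea can be sketched in a few lines of plain Python. This is only an illustration of symmetric per-tensor quantization, not CTranslate2's actual implementation; the function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale (assumed scheme)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero (or empty) tensor to avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
quantized, scale = quantize_int8(weights)
restored = dequantize_int8(quantized, scale)
```

Each weight is stored in one byte instead of four, and the restored values differ from the originals by at most about half of `scale`, which is the usual trade of a small accuracy loss for lower memory use and faster inference.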

Mahadhwani Pretrained Conformer

A pre-trained Conformer encoder model based on self-supervised learning, supporting automatic speech recognition tasks for 22 scheduled Indian languages.

Speech Recognition

Whisper Large V3 Distil Multi7 V0.2

A distilled multilingual Whisper model supporting automatic speech recognition for 7 European languages with code-switching capability

Speech Recognition

Transformers Supports Multiple Languages

Whisper Large V3 Turbo

Whisper large-v3-turbo is a fine-tuned version of a pruned OpenAI Whisper large-v3, with the decoder layers reduced from 32 to 4, which makes it significantly faster at the cost of a slight reduction in quality.

Speech Recognition Supports Multiple Languages

Whisper is a Transformer-based encoder-decoder model for speech recognition and translation tasks, supporting multilingual processing.

Speech Recognition

Whisper Small Uz En Ru Lang Id

A fine-tuned multilingual speech classification model based on Whisper-small, supporting speech recognition and classification for Uzbek, English, and Russian.

Audio Classification

Transformers Supports Multiple Languages

Owsm Ctc V3.1 1B

OWSM-CTC is an encoder-only speech foundation model based on hierarchical multi-task self-conditioned CTC, supporting multilingual speech recognition, speech translation, and language identification.

Speech Recognition Other

Whisper Large V3 Japanese 4k Steps Ct2

This is a CTranslate2 converted version of the OpenAI Whisper large-v3 model, specifically fine-tuned for Japanese with an additional 4,000 training steps, supporting multilingual speech recognition.

Speech Recognition Supports Multiple Languages

Canary-1B is a multilingual multi-task model developed by NVIDIA NeMo, supporting automatic speech recognition and speech translation tasks in English, German, French, and Spanish.

Speech Recognition Supports Multiple Languages

Whisper Large V3 Ft Cv16 Mn

A speech recognition model fine-tuned from OpenAI Whisper Large V3 on the Common Voice 16.0 dataset.

Speech Recognition

Multilingual Distilwhisper 28k

An improved multilingual automatic speech recognition model based on whisper-small that enhances target-language performance through a CLSR module and knowledge distillation.

Speech Recognition

Transformers Other

Faster Whisper Tiny

A CTranslate2 conversion of the OpenAI Whisper tiny model for efficient speech recognition.

Speech Recognition Supports Multiple Languages

Faster Whisper Base

This is the CTranslate2 converted version of OpenAI's Whisper base model, designed for efficient speech recognition tasks.

Speech Recognition Supports Multiple Languages

Faster Whisper Medium

This is the CTranslate2 converted version of OpenAI's Whisper medium model, designed for efficient speech recognition tasks.

Speech Recognition Supports Multiple Languages

Faster Whisper Large V3

Whisper large-v3 is a large-scale multilingual automatic speech recognition (ASR) model developed by OpenAI, supporting speech-to-text tasks in multiple languages.

Speech Recognition Supports Multiple Languages

Whisper Large V3

Whisper is an advanced automatic speech recognition (ASR) and speech translation model proposed by OpenAI, trained on over 5 million hours of labeled data, with strong cross-dataset and cross-domain generalization capabilities.

Speech Recognition Supports Multiple Languages

Lang Id Voxlingua107 Ecapa

An ECAPA-TDNN-based spoken language identification model trained on the VoxLingua107 dataset, supporting classification of 107 languages.

Audio Classification Supports Multiple Languages

Faster Whisper Large V1

This is the CTranslate2 converted version of the OpenAI Whisper large-v1 model for efficient speech recognition tasks

Speech Recognition Supports Multiple Languages

Faster Whisper Large V2

This is the CTranslate2 converted version of the OpenAI Whisper large-v2 model for efficient speech recognition.

Speech Recognition Supports Multiple Languages

Faster Whisper Medium

This project converts the openai/whisper-medium model to the CTranslate2 model format, which can be used for efficient speech recognition.

Speech Recognition Supports Multiple Languages

Faster Whisper Base

The Whisper base model is an Automatic Speech Recognition (ASR) model developed by OpenAI, supporting speech-to-text tasks in multiple languages.

Speech Recognition Supports Multiple Languages

Whisper Large V2 Slovenian

A speech recognition model fine-tuned from OpenAI's Whisper Large-V2 on the Common Voice 11.0 Slovenian dataset, with a reported word error rate of 13.83%.

Speech Recognition

Transformers Other
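
Word error rate (WER) figures such as the one quoted above are conventionally computed as the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below illustrates that calculation in plain Python; the function name `word_error_rate` and the sample sentences are illustrative assumptions, and it does not reproduce the evaluation script or text normalization used for this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1] / len(ref_words)


# One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Published WER values also depend on how transcripts are normalized (casing, punctuation, numerals) before comparison, so numbers from different model cards are not always directly comparable.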

Whisper Large V2

Whisper is a pre-trained automatic speech recognition (ASR) and speech translation model, trained on 680,000 hours of labeled data with strong generalization capabilities

Speech Recognition Supports Multiple Languages

Wav2vec2 Xls R 300m Mixed

A speech recognition model fine-tuned from Facebook's wav2vec2-xls-r-300m on mixed-language datasets, supporting Malay, Singaporean English, and Mandarin.

Speech Recognition

Xlsr Wav2vec2 2

A fine-tuned speech recognition model based on facebook/wav2vec2-large-xlsr-53, supporting multilingual speech-to-text tasks

Speech Recognition

An Armenian (hy/hye) automatic speech recognition model based on the wav2vec2-xls-r-2b architecture.

Speech Recognition

Transformers Other

A Transformer-based end-to-end speech translation model specifically designed for French-to-English speech translation tasks.

Speech Recognition

Transformers Supports Multiple Languages

Xtreme S Xlsr 300m Fleurs Langid

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the GOOGLE/XTREME_S - FLEURS.ALL dataset for multilingual speech recognition tasks.

Audio Classification

Transformers Other

Xtreme S Xlsr 300m Minds14

A multilingual speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the GOOGLE/XTREME_S - MINDS14.ALL dataset.

Audio Classification

Transformers Other

Xtreme S Xlsr Mls Upd

A Polish speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the GOOGLE/XTREME_S - MLS.PL dataset.

Speech Recognition

Transformers Other

Wav2vec2 Base 10k Voxpopuli

A foundational speech recognition model pretrained on 10,000 hours of unlabeled data from the VoxPopuli corpus, supporting multilingual speech processing

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr 53 Demo Colab

This model is a speech recognition model fine-tuned on the common_voice dataset based on facebook/wav2vec2-large-xlsr-53, primarily used for robust speech event recognition.

Speech Recognition

Wav2vec2 Large Mt Voxpopuli V2

Facebook's Wav2Vec2 large model, pretrained exclusively on unlabeled data from the VoxPopuli corpus for Maltese (mt), suitable for speech recognition tasks.

Speech Recognition

Transformers Other

Wav2vec2 Base 100k Voxpopuli

A speech recognition base model pretrained on 100,000 hours of unannotated data from the VoxPopuli corpus

Speech Recognition

Transformers Other

Wav2vec2 Large 100k Voxpopuli

A speech recognition model pre-trained on 100,000 hours of unlabeled data from the VoxPopuli corpus, supporting multilingual speech representation learning

Speech Recognition Other

Wav2vec2 Large Xlsr 53 Demo Colab

This is an automatic speech recognition model based on the wav2vec2 architecture, specifically optimized for the Tamil language and supporting Nepali speech recognition tasks.

Speech Recognition

Transformers Other

Wav2vec2 Pretrained Clsril 23 10k

An audio pre-training model based on self-supervised learning, capable of learning cross-lingual speech representations from raw audio of 23 Indian languages

Speech Recognition

Asr Voxrex Bart Base

This is an automatic speech recognition model based on a sequence-to-sequence architecture, capable of converting speech into text.

Speech Recognition


© 2025 AIbase